The Iterated Lasso for High-Dimensional Logistic Regression
Author
Abstract
We consider an iterated Lasso approach for variable selection and estimation in sparse, high-dimensional logistic regression models. In this approach, we use the Lasso (Tibshirani 1996) to obtain an initial estimator and reduce the dimension of the model. We then use the Lasso as the initial estimator in the adaptive Lasso (Zou 2006) to obtain the final selection and estimation results. We provide conditions under which this two-step approach possesses asymptotic oracle selection and estimation properties. One important aspect of our results is that the total number of covariates can be larger than the sample size. Simulation studies indicate that the iterated Lasso has superior performance in variable selection relative to the standard Lasso. A data example is used to illustrate the proposed approach.
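The two-step procedure described above can be sketched in code. This is a minimal illustration, not the authors' implementation: scikit-learn's L1-penalized `LogisticRegression` stands in for the Lasso solver, the function name and tuning parameters `C1`/`C2` are assumptions, and the adaptive-Lasso weights 1/|β̂_j| are applied via the standard column-rescaling trick.

```python
# Sketch of an iterated (adaptive) Lasso for logistic regression.
# Step 1: L1-penalized fit (the Lasso) screens covariates and supplies
# initial coefficients. Step 2: an adaptive Lasso reweights the selected
# covariates by 1/|beta_init_j|, implemented by rescaling the columns.
import numpy as np
from sklearn.linear_model import LogisticRegression

def iterated_lasso_logistic(X, y, C1=1.0, C2=1.0):
    # Step 1: plain Lasso-penalized logistic regression.
    lasso = LogisticRegression(penalty="l1", solver="liblinear", C=C1)
    lasso.fit(X, y)
    beta_init = lasso.coef_.ravel()
    selected = np.flatnonzero(beta_init)          # dimension reduction
    if selected.size == 0:
        return np.zeros(X.shape[1])
    # Step 2: adaptive Lasso. Scaling column j by |beta_init_j| is
    # equivalent to penalizing coefficient j with weight 1/|beta_init_j|.
    scale = np.abs(beta_init[selected])
    X_tilde = X[:, selected] * scale
    adaptive = LogisticRegression(penalty="l1", solver="liblinear", C=C2)
    adaptive.fit(X_tilde, y)
    beta = np.zeros(X.shape[1])
    beta[selected] = adaptive.coef_.ravel() * scale  # undo the rescaling
    return beta

# Toy p > n example: only the first 3 of 50 covariates are active.
rng = np.random.default_rng(0)
n, p = 40, 50
X = rng.standard_normal((n, p))
true_beta = np.zeros(p)
true_beta[:3] = [2.0, -2.0, 1.5]
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-X @ true_beta))).astype(int)
beta_hat = iterated_lasso_logistic(X, y)
print("selected covariates:", np.flatnonzero(beta_hat))
```

The second L1 fit on the reweighted covariates tends to prune spurious variables that survived the first screening step, which is the behavior the simulation studies above compare against the standard Lasso.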
Related works
Penalized logistic regression with the adaptive LASSO for gene selection in high-dimensional cancer classification
An important application of DNA microarray data is cancer classification. Because of the high dimensionality of microarray data, gene selection approaches are often employed to support expert systems in cancer diagnosis with high classification accuracy. Penalized logistic regression using the least absolute shrinkage and selection operator (LASSO) is one of the key st...
Non-asymptotic Oracle Inequalities for the Lasso and Group Lasso in high-dimensional logistic model
We consider the problem of estimating a function f0 in a logistic regression model. We propose to estimate this function f0 by a sparse approximation built as a linear combination of elements of a given dictionary of p functions. This sparse approximation is selected by the Lasso or Group Lasso procedure. In this context, we state non-asymptotic oracle inequalities for the Lasso and Group Lasso under...
High-Dimensional Generalized Linear Models and the Lasso
We consider high-dimensional generalized linear models with Lipschitz loss functions, and prove a nonasymptotic oracle inequality for the empirical risk minimizer with Lasso penalty. The penalty is based on the coefficients in the linear predictor, after normalization with the empirical norm. The examples include logistic regression, density estimation and classification with hinge loss. Least ...
Robust high-dimensional semiparametric regression using optimized differencing method applied to the vitamin B2 production data
Background and purpose: With the evolution of science, knowledge, and technology, we deal with high-dimensional data in which the number of predictors may considerably exceed the sample size. The main problems with high-dimensional data are the estimation of the coefficients and their interpretation. For high-dimensional problems, classical methods are not reliable because of the large number of predictor variable...
Penalized Lasso Methods in Health Data: application to trauma and influenza data of Kerman
Background: Two main issues that challenge model building are the number of Events Per Variable and multicollinearity among exploratory variables. Our aim is to review statistical methods that tackle these issues, with emphasis on the penalized Lasso regression model. The present study aimed to explain problems of traditional regressions due to small sample size and m...